Research Article | Open Access
Volume 2025 | Article ID 100048 | https://doi.org/10.1016/j.plaphe.2025.100048

TPDNet: Triple phenotype deepen networks for monocular 3D object detection of melons and fruits in fields

Yazhou Wang,1,3 Tianhan Zhang,1,3 Xingcai Wu,2 Qinglei Li,2 Yuquan Li,2 and Qi Wang2

1School of Information, Guizhou University of Finance and Economics, Guiyang, 550025, China
2State Key Laboratory of Public Big Data, College of Computer Science and Technology, Guizhou University, Guiyang, 550025, China
3These authors contributed equally to this work.

Received: 07 Nov 2024
Accepted: 21 Apr 2025
Published: 30 May 2025

Abstract

The growth of the global population has increased demand for fruits and vegetables, while high harvesting labor costs severely constrain the development of the industry. Automated harvesting currently relies mainly on 2D object detection to reduce labor costs, but 2D detection provides only planar information and cannot satisfy scenarios that require 3D spatial data. 3D object detection can meet these needs and falls into two categories: point cloud-based methods and monocular-based methods. Point cloud-based methods require expensive equipment and are therefore ill-suited to low-cost agricultural harvesting, whereas monocular 3D object detection requires only a camera and is easy to deploy. However, the agricultural field lacks specialized monocular 3D object detection datasets and algorithms suited to natural scenes, which limits the application and development of this technology in agricultural automation. To address this, we construct a 3D object detection dataset for wax gourds and propose a network called TPDNet, which captures the 3D information of fruits and vegetables in fields from a single RGB image. Specifically, we construct a depth estimation and enhancement module that introduces depth information into the model with the help of depth auxiliary labels and improves the representation of depth information by exploiting weight information across the spatial and channel dimensions. Meanwhile, because depth features and image features are heterogeneous, we design the phenotype aggregation and phenotype intensification modules to capture the correspondence between image and depth features, promoting their effective fusion.
Experimental results show that our method significantly outperforms other methods on the mAP3D and mAPBEV metrics, demonstrating its effectiveness. Our code and dataset are available at: http://tpdnet.samlab.cn.
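The idea of re-weighting a depth feature map "across spatial and channel dimensions" can be illustrated with a minimal NumPy sketch. This is not the authors' implementation; the gating scheme below (global-average-pooled channel gates plus a channel-averaged spatial gate, both passed through a sigmoid) is a common attention pattern assumed here purely for illustration.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_weights(feat):
    """Channel gate: global average pool each channel of a (C, H, W)
    feature map, then squash to (0, 1) with a sigmoid."""
    return sigmoid(feat.mean(axis=(1, 2)))            # shape (C,)

def spatial_weights(feat):
    """Spatial gate: average across channels, then sigmoid."""
    return sigmoid(feat.mean(axis=0))                 # shape (H, W)

def reweight(depth_feat):
    """Re-weight a depth feature map along both channel and spatial
    dimensions; output keeps the input's (C, H, W) shape."""
    cw = channel_weights(depth_feat)[:, None, None]   # broadcast (C, 1, 1)
    sw = spatial_weights(depth_feat)[None, :, :]      # broadcast (1, H, W)
    return depth_feat * cw * sw

rng = np.random.default_rng(0)
feat = rng.standard_normal((8, 4, 4))                 # toy depth features
out = reweight(feat)
print(out.shape)                                      # (8, 4, 4)
```

Because each gate lies in (0, 1), the operation attenuates less informative channels and spatial positions while preserving the feature map's shape, so it can be dropped into a network without changing downstream layer sizes.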

© 2019-2023 Plant Phenomics. All Rights Reserved. ISSN 2643-6515.
